Simplest random iT-satisfiability problem 



rir. Federico Ricci-TersenghiS, Martin Weigti, and Riccardo Zecchinall 

□'□ The Abdus Salam International Centre for Theoretical Physics, Condensed Matter Group 
p Strada Costiera 11, P.O. Box 586, 1-34100 Trieste, Italy 

□ Institute for Theoretical Physics, University of Gottingen, Bunsenstr. 9, D-37073 Gottingen, Germany 

"o 

' We study a simple and exactly solvable model for the generation of random satisfiability problems. 

, These consist of 7A'' random boolean constraints which are to be satisfied simultaneously by TV logical 

' variables. In statistical-mechanics language, the considered model can be seen as a diluted p-spin 

^ ' model at zero temperature. While such problems become extraordinarily hard to solve by local search 

^ , methods in a large region of the parameter space, still at least one solution may be superimposed by 

^\ • construction. The statistical properties of the model can be studied exactly by the replica method 

' and each single instance can be analyzed in polynomial time by a simple global solution method. The 

geometrical/topological structures responsible for dynamic and static phase transitions as well as for 
the onset of computational complexity in local search method are thoroughly analyzed. Numerical 
analysis on very large samples allows for a precise characterization of the critical scaling behaviour. 
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I. INTRODUCTION 



Complexity theory 0], as arising from Cook's theorem of 1971 deals with the issue of classifying combinatorial 
optimization problems according to the computational cost required for their solution. The hard problems are grouped 
in a class named NP, where NP stands for 'non-deterministic polynomial time'. These problems are such that a 
Q potential solution can be checked rapidly whereas finding one solution may require a time growing exponentially with 
O ' system size in worst case. In turn, the hardest problems in NP belong to a sub-class called NP-complete which is at 
the root of computational complexity. The completeness property refers to the fact that if an efficient algorithm for 
' solving just one of these problems could be found, then one would have an efficient algorithm for solving all problems 
J>. , in NP. By now, a huge number of NP-complete problems have been identified , and the lack of an efficient algorithm 
T— I ' corroborates the widespread conjecture that no such algorithm exists, or more formally that NP^^P where P includes 
00 , all problems solvable in polynomial time. 

• Complexity theory is based on a worst-case analysis and therefore does not depend on the properties of the particular 
instances of the problems under consideration. In order to deepen the understanding of typical-case complexity rather 
than the worst-case one and to improve and test algorithms for real world applications, computer scientists have 
' recently focused their attention on the study of random instances of hard computational problems, seeking for a 
link between the onset of computational complexity and some intrinsic (i.e. algorithm independent) properties of the 
model. Analytical and numerical results have accumulated showing that the computationally hard instances 

appear with a significant probability only when generated near "phase boundaries" , i.e. when problems are critically 
I ■ constrained. This phenomenon is know as the easy-hard transition. 
■ O I Randomized search algorithms provide efficient heuristics for quickly finding solutions provided they exist. At the 
^ ■ phase boundary, however, there appears an exponential critical slowing down which makes the search inefficient for 
any practical purpose. Understanding the behaviour of search processes at the easy-hard transition point constitutes 
an important theoretical challenge which can be viewed as the problem of building a generalized off-equilibrium 
theory for stochastic processes which do not satisfy detailed balance. No static probability measure describing the 
asymptotic statistical behaviour of the search processes is guaranteed to exist. Moreover, the hardest random instances 
; ^ ' of combinatorial optimization problems provide a natural test-bed for the optimization of heuristic search algorithms 
Ci ^ which are widely used in practice. 

How to generate hard and solvable instances is far from obvious and very few examples of such generators are 
known In most cases, like e.g. in the random Boolean satisfiability problem (K-SAT ^,^), for a short definition 
see note [ |Tl| , hard instances can be only found in a very narrow region of the parameters space. In this region the 
probability that a random instance of the problem has no solution at all is finite. Then, heuristic (incomplete) search 
algorithms have no way to disentangle, in a given finite time, the unsatisfiable instances, from those which are simply 
very hard to solve. 

In this paper, we shall discuss a very simple and exactly solvable model for the generation of random combinatorial 
problems. On one hand, these become extraordinarily hard to solve by local search methods in a large region of the 
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parameter space and yet at least one solution may be superimposed by construction. On the other hand, the model 
may be solved in polynomial time by a simple global method and therefore belongs to the class P. 

At variance with respect to the famous random 2-SAT problem , which is in P and can be solved efhciently by 

local search methods also at the phase boundary, the model we consider undergoes an easy-hard transition very 
similar (even harder) to the one observed in 3-SAT as far as local search methods is concerned. However, the exact 
mapping of the model on a minimization problem over uniform random hyper-graphs makes the problem analytically 
tractable. It is also solvable in polynomial time by a global method which allows for the numerical study of very large 
systems. Therefore, some of the open questions which arise the the analysis of 3-SAT and which are common to the 
present model can be answered exactly. 

In the context of statistical physics the model provides a simple model for the glass transition, in which the 
crystalline state can be view as the superimposed solution and the structure of the excited states is responsible for 
the off-equilibrium behaviour and the associated structural glass transition. These aspects will be the subject of a 
forthcoming paper. The limit of infinite connectivity provides one of the most studied models in the context of spin 
glass theory, see for instance . 

II. THE MODEL 

In order to unveil the different aspects of the model, to be referred to as hyper-SAT (hSAT), we give explicitly its 
definition both as a satisfiability problem and as a minimization problem over hyper-graphs. 

Here we discuss the hSAT model with K — 3 variables per constraint, which can be viewed as a perfectly balanced 
version of the famous random 3-SAT problem. The case K — 2 does not present any interesting computational features 
as far as hardness is concerned because it can be solved efficiently both by local and global methods. Generalizations 
to if > 3 are straightforward. 

Given a set of N Boolean variables {xi ~ 0, l}i=i,...,Ar, we construct an instance of 3-hSAT as follows. Firstly we 
define the following elementary constraints (4-clauses sets with 50% satisfying assignments) 

C{ijk\ + 1) = {xi V Xj V Xk) A {xi V Xj V Xk) A {xi V Xj V Xk) A {xi V Xj V Xk) 

C{ijk\ — I) = {xi V Xj V Xk) A {xi V Xj V Xk) A {xi V Xj V Xk) A {xi V Xj V Xk) , (1) 

where A and V stand for the logical AND and OR operations respectively and the over-bar is the logical negation. 
Then, by randomly choosing a set E of M triples k} among the N possible indices and M associated unbiased 
and independent random variables Jijk = ±1, we construct a Boolean expression in Conjunctive Normal Form (CNF) 
as 

F= /\ Ciijk\J,,k) ■ (2) 
{ij,fc}eB 

A logical assignment of the {xi}s satisfying all clauses, that is evaluating F to true, is called a solution of the 3-hSAT 
problem. If no such assignment exists, F is said to be unsatisfiable. 

A slightly different choice of Jijk allows to construct hSAT formulas which are random but guaranteed to be 
satisfiable. To every Boolean variable we associate independently drawn random variables Si — ±1, and define Jijk ~ 
EiEjEk for all {i, J, k} G E. For this choice, CNF formula (||) is satisfied by {xi \xi = -\-\ if Si = -1-1, Xi = \i Si = — 1}. 
As we shall discuss in great detail, these formulas provide a uniform ensemble of hard satisfiable instances for local 
search methods. We refer to this version of the model as the satisfiable hSAT. Indeed, the random signs of Jijk can 
be removed in this satisfiable case by negating all Boolean variables Xi associated to negative Si . The resulting model 
has Jijk = +1 for all {i, j, fc} g E, and the forced satisfying solution is Xi = 1, Vi = 1, N . The use of the {ci} is 
a way of hiding the latter solution by a random gauge transformation without changing the properties of the model. 
The impossibility of inverting efhciently the gauge transformation by local methods is a consequence of the branching 
process arising form the presence of if = 3 variables in each constraint. For any if > 3 the same result would hold 
whereas for if = 2 the problem trivializes. 

The hSAT model can be easily described as a minimization problem of a cost-energy function over a random 
hyper-graph. Given a random hyper-graph Qn,m = {y^E), where V is the set of N vertices and E is the set of M 
hyper-edges joining triples of vertices, the energy function to be minimized reads 

Hj[S\=M- J^lkS,S,Sk , (3) 

{t,j,k}eE 
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where each vertex i bears a binary "spin" variable Si = ±1, and the weights Jijk associated to the random bonds can 
be either ±1 at random, in the so called frustrated case, or simply equal to 1 in the unfrustrated model. 

Once the mapping Si = 1 if = 1 and S"; = — 1 if = is established, one can easily notice that the energy 
function in Eq.(y) simply counts twice the number of violated clauses in the previously defined CNF formulas with 
the same set of J's. The frustrated and the unfrustrated cases correspond to the hSAT and to the satisfiable hSAT 
formulas respectively. 

The computational issue consists in finding a configuration of spin variables which minimizes H. If all the terms 
JijkSiSjSk appearing in the energy are simultaneously maximized ("satisfied") the energy vanishes. This is always 
possible in the unfrustrated case just by setting Si = 1, Vi. In the frustrated case, there exist a critical value of 
the average connectivity above which the various terms start to be in conflict, that is frustration becomes effective 
in the model. In random hyper-graphs the control parameter is the average density of bonds, 7 = M/N (or, for 
the CNF formula, the density of clauses a = 47). For sufficiently small densities, the graph consists of many small 
connected clusters of size up to O(lniV). If 7 increases up to the percolation value 7^ = 1/6, there appears a giant 
cluster containing a finite fraction of the N sites in the limit of large N. However, also this cluster can a priori 
have a tree-like structure, for which the randomness of the couplings Jijk = ±1 can be eliminated by a proper gauge 
transformation. Si ±5^', of the spin variables. As we shall see, there exist two other thresholds of the bond density 
at which more complicated and interesting dynamical and structural changes take place. 

In spite of apparent similarities, hSAT and the random Boolean Satisfiability problem (K-SAT jll]!) differ in some 
basic aspects. 

In K-SAT the fluctuations of the frequencies of appearance of the variables in the clauses lead to both single and 
two body interactions in the associated energy function pO[ which force the minima in some specific random directions 
and which rule out the existence of a purely dynamical threshold (see below). Algorithms may take advantage of 
such information and both heuristic as well as complete algorithms show a performance which indeed depends on the 
criterion used to fix the variables. For example, rigorous lower bounds to the critical threshold have been improved 
recently by exploiting this opportunity in a simple tractable way On the same footing, the efficiency of the 

most popular heuristic and complete search algorithms, namely walk-sat and TABLEAU |g3|, is again based on 
strategies which exploit the above structure. Note that the above improvements cannot be applied to the hSAT model 
where formulas are completely balanced. 

Moreover, in K-SAT the mapping of the problem over directed random graphs is rather involved and the exact 
analytical solution is still lacking, while in hSAT the connection to random hyper-graphs is clear and makes the 
analysis tractable. 

Finally, restriction of K-SAT to satisfiable instances (for instance by selecting at random clauses which are satisfied 
by a previously fixed assignment of variables) does not provide a uniform ensemble of hard satisfiable problems even 
when restricted to local search methods ||2^ . 

Given the mapping over random hyper-graphs, the satisfiability problem for hSAT can be solved in 0{N^) steps by 
simply noticing that the problem of satisfying all constrains is nothing but the problem of solving an associated linear 
system modulo 2, i.e. in GF[2]. Upon introducing the two sets of binary variables {a.;} £ {0, 1}^ and {bijk} S {0, 1}*^ 
such that (—1)°' = Si and (— 1)''*^'= = Jij^, the hSAT decision problem becomes simply the problem of determining 
the existence of a solution in GF[2\ to the random linear system + aj + ~ bijki'oiod 2), with ijk running over 
all triples. 

Finally, we notice that in the high 7 UNSAT (or frustrated) region the optimization problem of minimizing the 
number of violated constrains, the so called MAX-hSAT, is indeed computationally very hard both for complete and 
incomplete algorithms and no global method for finding ground states is available. 

III. OUTLINE OF MAIN RESULTS 

For the sake of clarity, we anticipate here the main results leaving for the following sections a thorough discussion 
of the analytical and numerical studies. 

The frustrated hSAT model presents two clear transitions. The first one appears at 7^ = 0.818 and it is of purely 
dynamical nature. There the typical formula still remains satisfiable with probability one, but an exponential number 
of local energy minima appear at positive energies. Deterministic algorithms, like greedy search or zero temperature 
dynamics, try to decrease the energy in every step and thus get stuck at least at this threshold. Randomized 
algorithms may escape from these minima, but they undergo a slowing down from an exponentially fast convergence 
to a polynomially slow one, i.e. at the typical time for finding a solution diverges as a power of the number of 
variables. The dynamical transition at 7^ seems to be accompanied by a dynamical glassy transition due to replica 
symmetry breaking (RSB) effects connected with the appearance of an exponential number of local minima. An 
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approximate variational calculation (see ref. for a discussion on the method) involving RSB gives j2°rsb — 0-83 
which is in good agreement with the value of 7^ where local minima appear. 

The second transition appears at jc = 0.918 and corresponds to the so-called SAT/UNSAT transition (below -fc 
the typical problem is satisfiable whereas above 7c it becomes unsatisfiable). At this point the structure of the global 
energy minima changes abruptly. The ground states have strictly positive energy, thus no satisfying assignments for 
the hSAT formula exist any more. While the number of these configurations is always exponentially large (the ground 
state entropy is always finite), at 7c a finite fraction of the variables, the so-called backbone component, becomes 
totally constraint, i.e. the backbone variables take the same value in all minima [2^ . An important difference of 
the SAT/UNSAT transition in hSAT compared to K-SAT is the non-existence of any precursor. For 7 < 7c 
and large N, all variables Si take equally often the values -1-1 and —1 in the ground states (they have zero local 
magnetization), even those which become backbone elements when 7c is reached by adding new 4-clauses sets. The 
lack of any precursor comes from the non-existence of single- or two-body interactions in Eq.(|^). 

The unfrustrated or satisfiable hSAT problem has by construction at least one solution which we find to be su- 
perimposed without affecting the statistical features of the model for 7 < 7c in the limit of large N, including the 
dynamical transition at jd — 0.818. It is impossible to get any information on the superimposed solution by looking at 
the full solution space, because it is completely hidden by the exponential number of ground states. Randomly chosen 
satisiying assignments do not show any correlation. At exactly the same 7c = 0.918 as in the frustrated model, there 
appears a transition from a SAT phase with exponentially many unbiased solutions to another SAT phase where the 
solutions are strongly concentrated around the superimposed solution. The latter one is now hidden by the presence 
of exponentially many local energy minima with positive cost. These minima have exactly the statistical properties 
of the global minima of the corresponding frustrated hSAT problem, that is the hSAT problem defined over the 
same hyper-graph but with randomized signs of the couplings Jijk- More specifically, the energy, the entropy and 
the backbone component size coincide. Due to their finite entropy, an algorithm will thus hit many of these local 
minima before it reaches the satisfying ground state. As one can see from Fig. ^ finding this solution by backtracking, 
e.g. with the Davis-Putnam (DP) procedure |2^, is nevertheless easier than proving the unsatisfiability of hSAT (or 
identifying ground states in the frustrated version). This results stems from the missing information on the true 
ground state energy of hSAT above 7c. The solution time is however found to be clearly exponential in both cases. 
In the 7 > 7c region, the model provides a uniform ensemble of hard SAT instances for local methods which can be 
used to test and optimize algorithms. 



IV. STATISTICAL MECHANICS ANALYSIS: THE REPLICA RESULTS 



In our analytical approach, we exploit the well known analogies between combinatorial optimization problems and 
statistical mechanics. In both cases, the system is characterized by some cost-energy function, as it is given e.g. by 
Eq.(|) for hSAT. In equilibr ium statistical mechanics, any configuration S = {Si}i^i,,,N is realized with probability 
exp{—(3H[S]}/Z where f3 — 1/T is the inverse temperature and Z the partition function. If the temperature is 
lowered, the probability becomes more and more concentrated on the global energy minima and finally, for T ~ 0, 
only the ground states keep non-zero weights. 

In order to compute the average free energy, we resort to the replica symmetric (RS) functional replica method 
developed for diluted spin glasses which is known to provide exact results for ferromagnetic models: To circumvent the 
difficulty of computing the average value of In Z, we compute the n*^ moment of Z for integer- valued n and perform 
an analytical continuation to real n to exploit the identity In <C Z" ^= 1 -I- n <C InZ ^ +0{n'^). The n*'' moment 
of Z is obtained by replicating n times the sum over the spin configuration and averaging over the disorder p8[ 

<Z">= <cxp [-/3^i/j[S"] ] > , (4) 

S1,S2,...,S" \ a=l / 

which in turn may be viewed as a generating function in the variable exp(— /3). 

In order to compute the expectation values that appear in Eq.(^), one notices that each single term 
exp(— /3^"^-^ 7?j[S"]) factorises over the sets of different triples of indices due to the absence of any correlation 
in the probability distribution of the Jijk- It follows 

<Z">= ^ exp J -/377Vn-77V+^^e'^Sa^^'°^^°^^° +0(1) i (5) 

s^s^...,s'" I ijk I 
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The averaged term in Eq.(||) depends on the n x N spins only through the 2" occupation fractions c{a) labelled by 
the vectors a with n binary components; c{a)N equals the number of labels i such that Sf = a"', Va = 1, . . . ,n. 
Therefore, the final expression of the n*'' moment of Z to the leading order in N (i.e. by resorting to a saddle point 
integration), can be written as ^ Z" exp(iV F[c]) where F[c] is the maximum over all possible c(f7)'s of the 
functional 

-/3F[c]=-7(l+/3n)-^c(<T)lnc(a)+7^c(a)c(p)c(f)exp(/3^aVT'^) • (6) 



The saddle point equation ^q^^^'' A — 1 reads 

c(a) = exp J -A + 37 ^ c(p)c(f) exp(/3 ^ aVV'^) I (7) 

{ pr a j 

where the Lagrange multiplier A enforces the normalization constraint '^gc{<7) — 1, and goes to 87 for n — > 0. In 
Eq.(^, one may easily identify two terms, one model dependent and the other (— ^((t) In c((t)) simply describing 
the degeneracy (the so called combinatorial entropy) with which each term of the generating function appears given 
the representation in terms of the occupation fractions. In the limit of interest T ^ and in the replica symmetric 
subspace, the freezing of the spin variables is properly described by a rescaling of the local magnetizations of the form 
m = tanh(/3ft-). The probability distribution P{h) is therefore introduced through the generating functional 

c(a) = / dh P(h)- — — (8) 

^ ' J-00 ^ ^(2cosh(/3/i))« ^ ' 

where h is nothing but an effective field in which the spins arc immersed, c depends on a only via s — ct°. In this 
representation, the free-energy reads 

- PF[Pih)] = 

2cosh(/3(/ii +/i2))[e'"'^' +e-2/3-/3''3] _^2cosh(/3(/ii - /i2))[e"'"*^' 



7 dhidh2dh3Pihi)P{h2)P{h3)ln- 



(2 cosh(/3/ii ) ) (2 cosh(/3/i2) ) (2 cosh(/3/i3) ) 



^^^e'''^ Pft{K)[1 - lnPj.T(^)]ln[2cosh(/3;i)] (9) 
where Pft{K) is the Fourier transform of P{h). The associated saddle-point equation reads 

dhP{h)e'^''' ^expl^-3j + 3j J dhidh2P{hi)P{h2)G{hi,h2)^ , (10) 
where 

/ cosh(/3(fei+/i2))+e-^^cosh(/3(/n-fe2)) y , . 

^ ^' \cosh{P{h, - h2)) + coshiPihi + h2)) J ■ ^ ' 

In the case of satisfiable hSAT, at /3 ^ cx) (T = 0) and in the version having no random gauge {Jijk = +1, V{i, j, k} G 
E), the spins turn out to be subject to an effective local field h which fluctuates from site to site according to the 
following simple probability distribution 

P{h)=Y,pi^'^Sih~i) (12) 



e>o 



with the saddle-point conditions 

pii) ^ (37)^ tP^l^ 

p(°)=exp{-37(l-pf )n (13) 

The above structure is not surprising for a ferromagnetic model since 1 — p^'^'' is nothing but the fraction of sites which 
have non-vanishing field and are therefore totally magnetized. The saddle-point equations simplify once rewritten in 
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terms the probability distribution P{m) of the local magnetizations m; = 0, 1 which takes the particularly simple 
form 

P{m) = p(o)(5„,,o + (1 - P^"))<5„M, (14) 

where S.,. is the Kronecker-symbol. Thus, a fraction 1 — p"^^ of all logical variables is frozen to +1 in all ground states, 

whereas the others take both truth values with same frequency. The self-consistent equation for p"-^^ in Eq. ( |l3|) can 
be rewritten as 

p(0)^ge-37(^(l_(l_p(0))2)c (15) 
c=0 

and can be justified by simple probabilistic arguments: A variable is frozen if and only if it is contained in at least 
one hyper-edge k} d E where also the two neighbors are frozen. Thus a variable is unfrozen, mi = 0, if and only 
if every adjacent hyper-edge contains at least one more unfrozen variable. For a spin of connectivity c, this happens 
according to Eq.Q with probability (1 - (1 - The average over the Poisson-distribution e of 

connectivities c results in the total probability for a variable to be unfrozen, so Eq.(|l^) follows. As an additional 
result of replica theory, we derive the ground-state entropy 

s(7) = ^lirn^ ^ InNgs - ln(2) (pf (1 - Inpf) _ ^[i „ (1 . (16) 

For small 7, Eq.(p^) has only the trivial solution p-y — I where all variables are unfrozen, i.e. rrii — for all i. 
No internal structure is found in the set of satisfying assignments and, choosing randomly two of them, they have 
Hamming distance 0.5iV + 0{-\fN). To leading order in iV, the M 4-clauses sets act independently, each dividing the 
number of satisfying assignments by two, i.e. Mgs — 2^^^^''''. This is a clear sign that the structure of the hyper-graph 
is still tree-like. 

At 7c; = 0.818, a new solution of Eq.(^5|) appears discontinuously, having a fraction (1 — Pt*"*) — 0.712 of completely 
magnetized variables. This transition can be seen as a percolation transition of fully magnetized triples of connected 
variables. The entropy of this solution remains however smaller than the entropy 1 — 7 of the paramagnetic solution, 
thus the total solution space is still correctly described by TOj = for alH = 1, ...,iV. The appearance of the new 
solution signals however a structural change in the set of solutions which breaks into an exponential number of clusters. 
The cluster containing the imposed solution Xi ~ +\ is described by the new meta-stable solution. 

Another important difference to the low-7 phase is an exponential number of local minima of the energy function 
showing up at 7^. These have positive energies, and the corresponding logical assignments do not satisfy the hSAT 
formula. Algorithms which decrease the energy in every time step by local variable changes, like e.g. zero-temperature 
Glauber dynamics or greedy algorithms, get almost surely trapped in these states and do not find a zero-energy ground 
state for 7 > 7^. Randomized algorithms may escape from these minima, but as found numerically, this causes a 
polynomial slowing down. 

By increasing 7 above 7^, the number of ground-state clusters decreases further. At 7c = 0.918 all but one ground- 
state clusters disappear, and the non-trivial solution of (|l^) becomes the stable one. So only the cluster including the 
imposed solution survives, it still contains 2" °*^^ solutions, but 88.3% of all variables are fixed to +1, thus forming 
the backbone which appears discontinuously. As is known from Ref. 0, the existence of an extensive backbone is 
closely related to the exponential computational hardness of a problem. The remaining 11.7% of un-frozen variables 
change their values from ground-state to ground-state. They are contained in small disconnected components or 
dangling ends of the hyper-graph. 

The behavior of frustrated hSAT is similar, as given both by numerical analysis and by RS or variational RSB cal- 
culations. We find that the solution Xi — (and its corresponding cluster) in satisfiable hSAT are just superimposed 
to the solution structure of random hSAT. Thus, the statistical properties of the solutions do not change for 7 < 7c, 
including also the clustering of solutions above 7^. At 7c — 0.918 the model undergoes a SAT/UNSAT transition, and 
the solution entropy jumps from 0.082 down to minus infinity. The variational RSB calculation gives a value for the 
dynamical critical connectivity ^Yrsb — 0-83 which is close to the exact value 0.818. This result gives evidence for the 
validity of the variational approach in the region where local minima first appear, i.e. where the result does not depend 
strongly on the specific functional Ansatz made for the RSB probability distributions. For the SAT/UNSAT static 
transition the predictions of the variational RSB analysis can be strongly affected by the restriction of the functional 
space which does not necessarily match the geometrical structure (clustering) of the space of solution. However, in 
the case of hSAT the results are still in good agreement, we find 7c "s^, — 0.935. 
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V. CONNECTION WITH GRAPH THEORY 



In the hSAT model, we are able to extract exact results - without the need of RSB - by identifying the topological 
structures in the underlying hyper-graph which are responsible of the SAT/UNSAT transition (or of frustration and 
glassiness). The presence (or the absence) of such topological structures in the hyper-graph drastically changes the 
statistical mechanical properties of the model. The different phase transitions can be viewed as different kinds of 
percolation in the random graph theory language . 

We have already seen that the formation of a locally stable ferromagnetic state in unfrustrated hSAT at 7^ = 0.818 
can be understood in term of percolation arguments. The same arguments reveal that at 7^ many metastables states 
appear in both versions of the model, giving rise to a dynamical transition. 

In order to understand what happens at the critical point 7c we need to introduce the notion of hyper-loops^ that 
is the most natural generalization of the usual loop to hyper-graphs whith multi-vertex links. Given a random graph 
Q = (V, E), where V is the set of vertices and E is the set of (hyper-)links, a hyper-loop can be defined as a non-zero 
set of (hyper-)links, R C E, such that the degree of the subgraph C = {V, R) is even, i.e. every vertex belongs to an 
even number of (hyper-)links (including zero). In Fig. Q (left) we show the smallest hyper-loop in a K — 3 random 
hyper-graph. Note that in random hyper-graph typical hyper-loops are very large and the one shown in Fig. |l| (left) 
is extremely rare for N large. 




FIG. 1. The simplest hyper-loop (left) and the hyper-loop with one totally constrained vertex of odd degree (right) |jO[ . 
Triangles represent the interaction between the three spins located at the vertices. The black dot represents the constrained 
spin residing on the odd-degree vertex of the hyper-loop. 

In a similar way we can identify those vertices which are totally constrained. A set of (hyper-)links, T^^'^ C E, 
constrains completely the spin at site i if in the sub-graph = (V,T(*)) the vertex i has an odd degree and the 
remaining vertices an even one. In Fig. |l| (right) we show the smallest of such structures. 

In a zero-energy configuration (SAT assignment) we have SiSjSk — Jijk,y{i, j, k} € E. Then, given any hyper-loop 
R, we conclude 

n j^ik= n s,sjSk^i , (17) 

{i,i,fc}e-R {i,j,k}<^R 

where the second equality comes from the fact that in the second product every spin appears an even number of times. 

In frustrated hSAT the couplings are randomly fixed to ±1 and, consequently, the first product in Eq.(|l7|) is equal 
to -1 with probability 1/2. Then we can conclude that as soon as one hyper- loop arises in the hyper-graph half the 
formulas become unsatisfiable. In general, given a hyper-graph with N^i hyper-loops, the fraction of SAT formulas 
(with that given hyper-graph) is 2"^'*' . Still one needs to average this fraction over the random hyper-graph in order 
to obtain the right fraction of SAT formulas. 
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We have numerically found that at the critical value 7c = 0.918 the percolation of hyper-loops takes place, that is, 
in the large N limit, the average number of hyper-loops Nhi{"f) is zero for 7 < 7c and 0{N) for 7 > 7c. This is the 
direct explanation of the SAT/UNSAT transition in terms of hyper-graph topology. 

In the unfrustrated model Jijk — 1 and Eq.(|l7|) is always satisfied. However, the mean number of hyper- loops 
Nhiil) is related to the entropy of satisfying assignments through 5(7) = (1 — 7 -I- Nhi{j)/N) In 2. The derivation of 
this equality straightforward if we consider the linear system modulo 2 of M equations in N variables, introduced at 
the end of section ||. In terms of the linear system hyper-loops represents combination of equations giving a trivial 
one (e.g. = 0) which does not fix any degree of freedom. The entropy, which is proportional to the number of degree 
of freedom, is then given by S{-/) = ln(2)(iV - M - Nhiil))/N. 

Considering now a totally constrained spin at site i and a SAT assignment, we have that 

Y[ Jijk = W SiSjSk — Si . (18) 
{i,],k}eTW {i,jM}eTW 

Then, in every SAT formula, hyper- loops with one odd-degree vertex (to be denoted by the label hi — 1) fix one spin 
variable to a complicated function of the couplings. We have numerically checked, that such structures arise at 7c, 
like hyper-loops, but in a discontinuous way. 

In satisfiable hSAT we have Jijk = 1, so that any independent hyper- loop with one odd degree vertex fixes 
one spin to 1 Then the magnetization of the model is equal to the mean density of such loops, m{j) = 

Phi-iil) — Niii„i('j)/N. Because of the discontinuous nature of the transition the limits lim _^ - pui-iil) ^ 
lim^^^+ phi-i{'~i) = TOc = 0.883 do not coincide. 

In frustrated hSAT Eq.(|T^) fixes the variables belonging to the backbone. Then one would be tempted to relate 
the backbone size to the density Phi-i{l) of hyper-loops with one frozen vertex (which is true) and to estimate the 
backbone size at the critical point to be 88.3% (which is not true). Indeed at the critical point there is a coexistence 
of SAT and UNSAT formulas (see next section) and for 7 > 7c all the formulas become UNSAT in the large N 
limit. Then Eq.(|l8|) can be applied only for 7 < 7c where the density pni-i goes to zero when N — > 00. While the 
appearance of the backbone is necessarily related to the presence of hyper-loops with frozen vertices, the estimation 
of its size is non-trivial. A very rough estimate can be obtained assuming that at the critical point half the formulas 
are SAT (according to the numerical results presented in the next section) and that the backbone size is for UNSAT 
formulas and 0.88 for SAT ones. Under these very crude hypothesis the backbone size would be 0.44, which in not 
too far from the numerical result (see next section). 

VI. NUMERICAL RESULTS 

We have performed extensive numerical experiments on both versions of hSAT in order to confirm analytical 
predictions and to compute quantities which are not accessible analytically. Beside the GF[2] polynomial method, 
we have also used two local algorithms, namely the Davis-Putnam (DP) complete backtrack search |2^ and the 
incomplete walk-SAT randomized heuristic search , to check the hardness of the problem for local search. The 
existence of at least one solution in the satisfiable hSAT allowed us to run walk-SAT in the whole range of 7, the 
halting criterion always being finding a SAT assignment. 
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FIG. 2. The probability that a formula is SAT as a function of the coupling density. Inset: The energy reached by a 
deterministic rule becomes different from zero at the dynamical critical point. 



The first set of results concerns the numerical determinations of the critical points of hSAT obtained by the 
polynomial method over large samples. 

For the frustrated case, the fraction of satisfiable instances drops down to zero at 7c — 0.918. In Fig. ^ we show 
this fraction as a function of 7, which has been obtained, for any size iV, counting the number of hyper-loops in 10^ 
different random hyper-graphs. For any given random hyper-graph the fraction of SAT formulas is given by 2^^'^' , 
where Nhi{'j) is the number of hyper-loops. The same set of simulations run on the satisfiable hSAT show that at 
exactly the same 7c, the model undergoes a discontinuous ferromagnetic transition. 

At 7d = 0.818 a dynamical transition takes place in both versions of hSAT. There appears an exponentially large 
number of positive energy local minima strongly affecting non-randomized dynamics, which is not able to overcome 
energy barriers. We can easily detect the dynamical transition by adopting the following deterministic algorithm 
as a probe and by checking where it stops converging to solutions. The algorithm exploits the only local source of 
correlations among variables that is fluctuations in connectivity. At each step, the algorithm chooses the variable 
with highest connectivity, fixes its value at random and it simplifies the formula ("unit clause propagation" pTf). 
As can been seen in the inset of Fig. ^ the energy reached running the above process on very large formulas {N = 
10^, 10"^, 10^) starts to deviate from zero at a value which is highly compatible with the analytical prediction 7^ — 0.818. 
Unfortunately the mathematical analysis of this kind of algorithm appears to be beyond our present skills due to 
the correlations induced into the simplified formulas by the particular choice of variables. For a simple random 
(connectivity independent) choice of the variable the algorithm can be analyzed along the lines of Ref. and a 
convergence can be proven up to 7 = 2/3, which is also a rigorous lower bound to the true critical density 7c. A 
rigorous upper bound is easily established by noticing that the probability for the satisfiability of a formula at fixed 
7 is bounded by the number of satisfying assignments, averaged over all formulas of length •jN. It follows 7c < 2 In 2 
(which is the so called annealed bound known in the statistical mechanics of disordered media) . 
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FIG. 3. Scaling function for the SAT probability. Lower inset: The 7 value where the first hyper-loop arises scales as A'^"^. 
Upper inset: The critical width undergoes a crossover from = 1 to = 2. The fitting curve is 3A/N + 0.74/\/iV, while the 
line is the asymptote 0.74/\/iV. 



We have performed standard finite size scaling analysis in order to determine the size of the critical window w{N) 
and the v exponent defined by 'w{N) oc N~^^" for large N . 

In a growing random hyper-graph as soon as the first hyper-loop arises the fraction of SAT formulas drops down to 

0. 5. We have measured the mean 7 value where this event takes place, jdN). Such value scales as 7c(A^) — 7c oc N~'^, 

1. e. its critical exponent is = 1 as expected for a discontinuous transition (see lower inset in Fig. y). 

However in the model there is also another source of pure statistical (not critical) fluctuations |Q . These fluctuations 
come from the fact that almost every formula can be modify by the addition (or deletion) of order ^/N clauses without 
changing its satisfiability. Therefore in the large N limit these purely statistical fluctuations will dominate the critical 
ones, leading to an exponent 1/ = 2 in the scaling of the SAT probability. In the upper inset in Fig. |^ we show the 
width of the critical region as a function of N, together with the best fit of the kind Ax~^ + Cx^^/'^. Notably the 
best fitting value for B is perfectly compatible with 1, giving more evidence to the crossover from critical fluctuations 
(1/ — 1) to statistical ones (i' — 2). 

In the main part of Fig. y we shown the scaling function for the SAT probability. Note that the value at criticality 
is equal to 0.5 up to the numerical precision. Slight deviations from perfect scaling appear in the 7 > 7c region. 
However scaling relations hold only close to the critical point and our data perfectly collapse in all the range where 
the SAT probability is between 0.2 and 0.8. 
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FIG. 4. The lowest lines are the analytical expressions for the entropy of the unfrustrated model. The numerical estimation 
(not reported) perfectly coincide. Dashed parts correspond to metastable states. The rest of the data (entropy in the main 
body and energy and backbone size in the inset) come from exhaustive enumeration of the ground states in the frustrated 
model and of first excited states in the unfrustrated one (only A'^ = 40, 60) and they coincide. 



The different kind of transition taking place at 7c in the two versions of hSAT is reflected in the behavior of 
their ground-state entropies 5(7) shown in Fig. ^. For 7 < 7c both entropies coincides and they have the analytical 
expression 5(7) = ln(2)(l — 7) up to 7c. For 7 > 7c, while the entropy of satisfiable hSAT decreases exponentially 
fast with 7 (the solutions are more and more concentrated around the superimposed one), in the frustrated version 
the entropy decreases more slowly with 7, indicating that the number of unsat assignments minimizing the energy 
remains exponentially large up to 7 ^ 7c. 

At the SAT/UNSAT transition the solution space acquires a backbone structure, with a finite fraction of the 
variables that take the same value in all the solutions. Above the critical threshold a similar structure characterizes 
the ground states. In the inset of Fig. ^ we report the results of exhaustive ground states enumeration on small 
systems, giving the average size of the backbone and the average energy per hyper-link. Increasing the system size, 
the average energy converges to zero for 7 < 7c and it becomes positive at 7c in a continuous way. The appearance 
of the backbone becomes sharper increasing the system size and, in the thermodynamical limit (N 00), we expect 
it to be zero for 7 < 7c and finite for "f > "fc, consistently with a random first order phase transition predicted by 
the replica theory. As can be seen in the inset of Fig. ^ the backbone size does not depend strongly on the system 
size in the UNSAT phase. As discussed in Ref. the presence of a finite backbone is conjectured to be the source of 
computational hardness in finding solutions at the SAT/UNSAT transition for both complete and randomized local 
algorithms. 

In the 7 > 7c region the backbone size shows clear oscillations, due to finite size effects. At fixed energy the 
backbone size is a non-decreasing function of 7, but it typically decreases when the energy jumps to a higher value. 
For finite systems such jumps, which are of order are particularly evident and induce observable fluctuations in 
the backbone. We expect these fluctuation to disappear in the thermodynamic limit. 

In satisfiable hSAT, once we consider only the lowest local minima configurations just above the zero energy solutions 
(the so called excited states) we find that they share completely the same statistical properties with the ground states 
of the corresponding frustrated hSAT, i.e. the model defined over the same random hyper-graph with randomized 
couplings. We have performed a set of DP runs in satisfiable hSAT similar to the ones used previously, with the 
additional requirement of not considering the superimposed solution. The backbone size, the average energy and 
the entropy of the excited states just above the solution are identical to those measured on the ground states of the 
frustrated version (see Fig. ^). These results, together with some preliminary analytical findings js^, show in detail 
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why a model without quenched frustration behave and can be modeled as having random sign interactions, i.e. like a 
spin glass model. Such a mapping is believed to play a particularly important role in spin glass theory of structural 
glasses, in which the only source of frustration is geometrical (i.e. dynamical). Once the Boltzmann temperature T 
is introduced in the model, the critical points of hSAT can be thought of as zero temperature limits of critical lines 
in the (T, 7) plane. In spite of the absence of any static frustration and of the existence of a pure "crystalline" state 
(the spin configurations corresponding to the satisfying assignment), the spin system undergoes several dynamical 
and static transitions as the temperature is lowered. Both the crystalline state and the first excited states are never 
reached in any sub-exponential time and the system stays for very large times in the metastable states (the same 
happens in the frustrated version). 
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FIG. 5. The computational costs for finding a solution or proving unsatisfiability with the Davis-Putnam algorithm strongly 
increase approaching the critical point. For 7 > 7c they grow exponentially with the problem size A^. Inset: The same 
computational costs for the walk-SAT algorithm, which can be run for every 7 in the satisfiable model [N — 25, 50, 75, 100). 

In Fig. ^ we report data concerning the computational costs for finding a solution in the satisfiable hSAT and for 
proving satisfiability for the frustrated hSAT ||3^. For both algorithms (DP and walk-SAT) and in the whole range 
of 7, we have measured the logarithm of the running time averaged over thousands of samples of different sizes. The 
choice of analyzing the averaged logarithm instead of the logarithm of the average is dictated by the presence of 
fat tails in the running time distributions, even in the 7 < 7c region. The averaged logarithm provides directly the 
information on the most probable cost. 

The main body of the figure displays the DP computational costs for proving satisfiability in hSAT and for finding 
the satisfying assignment in the satisfiable hSAT (given the same underlying hyper-graph structure). Both costs show 
a sharp easy-hard transition at 7,,, where an enormous increase in the typical running times take place. For 7 < 7c 
both costs obviously coincide and they increase as a power law of N, the only effect of 7^ being a change of the 
exponent from 1 to a large value which eventually diverges at 7c. For 7 > 7c, the computational costs remain very 
high, i.e. < ln[r(7)] >cx a{^)N , with an exponent a slowly decreasing as I/7 |p7| . 

In the inset of Fig. | we show the average logarithm of the running times needed by walk-SAT for finding a 
solution in the satisfiable hSAT model. Analogously to DP the walk-SAT costs undergo an easy-hard transition at 7c. 
Interestingly enough, above 7c the computational cost for finding solutions remains quite high and does not decrease 
as in DP, where the additional constraints act as a pruning strategy in the search process. In the hard satisfiable 
region standard heuristic algorithms, like walk-SAT, get stuck in local minima and they are not able to exploit the 
large number of constraints in order to reduce the searching space. In particular, the large scale structure {0{N)) of 
the hyper-loops makes them difhcult to be detected in polynomial time by a local search process which is dominated 
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by the exponential branching process arising at each step when the tentative choices for the variables are made. 
However, having at hand a model on which new heuristic algorithms can be tested, such a searching optimization can 
be hopefully pushed far forward. 

A thorough analysis of the dependence of computational costs on A'' gives the following overall picture. For 7 < 7d 
the cost is a linear function of N. For 7 e [jdjlc) the typical cost increases as a power law of N, with an exponent 
which should diverge in 7c. For 7 > 7c the costs are exponential in N. 

VII. CONCLUSIONS 

In this paper we have studied a model for the generation of random combinatorial problems, called hyper-SAT. 
In the context of theoretical computer science such a model is simply the completely balanced version of the famous 
K-SAT model, while in statistical physics it corresponds to a diluted p-spin model at zero temperature. We have 
studied two version of the model, a frustrated one and an unfrustrated one. 

Increasing the density of interactions, 7 = M/N, the model undergoes two transitions. The first one is of purely 
dynamical nature whereas the second one is static. Such phase transitions have a straightforward interpretation 
in terms of the structure of the underlying hyper-graphs, leading to a very simple connection between theoretical 
computer science and graph theory, and statistical physics of random systems. 

The locations of phase boundaries, can be computed exactly within the RS replica formalism, leading to 7^; = 0.818 
and 7c = 0.918. We expect the replica results to be computable also by more rigorous probabilistic methods. 

Exploiting a global solution method which is polynomial in the problem size, we have been able to study very 
large problems, determining with high precision critical points and critical exponent, and a cross-over from critical 
fluctuations to statistical ones has been measured. 

We have found that the computational costs for finding a solution to a typical problem or to prove that it is 
unsatisfiable using only local search methods undergoes an easy-hard transitions at 7^ and 7c. The growth of the 
costs with the problem size N is linear up to 7^, is polynomial in A'^ in the range 7d < 7 < 7c and finally it becomes 
exponential in N above 7c. The above scenario has been checked for both complete and incomplete local algorithms, 
thanks to the existence of an halting criterion in the unfrustrated version of hyper-SAT where at least one solution is 
guaranteed to exist. The use of this model as a benchmark for heuristic algorithms may result in a good improvement 
of their performances in the phase where many local minima are present. 

hSAT can be viewed as an intermediate model between 2-SAT and K-SAT {K > 2), which is exactly solvable and 
in which the presence of hidden solutions can be kept under some control. Hopefully, some of the results and methods 
of our analysis of hSAT can be extended to NP-complete problems. 
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